
\begin{section}{The Modern Evolutionary Synthesis -- An Overview}

The motivation here is to study, and attempt to explain, observed phenomena such as adaptation and speciation in biological life by taking a quantitative, statistical approach. The ideas and methods presented here underlie the modern evolutionary synthesis and the results it has produced in the more general field of quantitative genetics.

The scope of this presentation is limited to the study of allele frequency distributions in a sexually reproducing population of organisms. For the sake of brevity, we look only at the genotypes that emerge in this population over time and ignore the way they are expressed (or not expressed) as phenotypes.

\end{section}

\begin{section}{A Simple Example}

Consider a population of crickets that mate at random and live in an idealized environment with negligible competition for resources and no natural predators. Assume further that this population is bound to a closed geographic region with no natural barriers that would tend to divide the population, no migration of any organisms into or out of this region, and that the crickets are evenly distributed throughout it. A final assumption we make (for technical reasons) is that no two generations in this population ever ``overlap'': older crickets effectively die out once they reproduce. To put it simply, we are assuming these crickets live in an incredibly boring world where nothing changes except their genes (in well-defined and statistically predictable ways).

To make the effects of genetic drift negligible, we assume the population is very large (on the scale of a biblical plague).

We assume:

\begin{itemize}
\item An infinitely large population of crickets.
\item Mating is random.
\item Genetic recombination does not occur during sexual reproduction.
\item All reproduction occurs within a generation; there is no generational overlap.
\item The rate of mutation is zero.
\item There are no selective pressures on the population.
\item There is no migration into or out of the population, and there are no natural barriers in the geographic region that could cause speciation.
\end{itemize}

The reason we want this idealized world is not to model anything observable in nature, but to provide a null model that we can compare against empirical observations. We want to analyze the statistical patterns that emerge in the distribution of alleles in this boring world, so that we can detect what interesting things are happening in the real world by looking at how our observations of nature deviate from this trivial distribution.

Take a pair of alleles $(A,a)$ at a single locus in our population at generation $G_n$, with frequencies $F_{n}(A,a) = (p,q)$ and $p + q = 1$. We formulate the hypothesis that, under the conditions of our assumptions (random mating, no mutation, no migration, large population size, and no selective pressure), the two alleles carried by each individual in generation $G_{n+1}$ are drawn independently and at random from the gene pool of $G_n$.

We recall from high school biology that the genotypes that can be formed when two ${\bf Aa}$ crickets reproduce are given by the matrix (or {\bf Punnett square}):

\[
A_{n+1}(A,a) =
\begin{pmatrix}
{\bf AA} & {\bf Aa} \\
{\bf Aa} & {\bf aa} \\
\end{pmatrix}
\]

Given the known frequency distribution $F_{n}(A,a) = (p,q)$, we can compute the expected frequency of each possible genotype in $G_{n+1}$:

\[
F_{n+1}(A_{n+1}(A,a)) =
\begin{pmatrix}
F({\bf AA}) & F({\bf Aa}) \\
F({\bf Aa}) & F({\bf aa}) \\
\end{pmatrix}
=
\begin{pmatrix}
p \cdot p & p \cdot q \\
p \cdot q & q \cdot q \\
\end{pmatrix}
=
\begin{pmatrix}
p^2 & pq \\
pq & q^2 \\
\end{pmatrix}
\]

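Since the heterozygote {\bf Aa} appears in two cells of the matrix, its frequency is $pq + pq = 2pq$. So we obtain the classic result that $F_{n+1}({\bf AA}) = p^2$, $F_{n+1}({\bf Aa}) = 2pq$ and $F_{n+1}({\bf aa}) = q^2$, which sum to $(p+q)^2 = 1$. These frequencies are called the {\bf Hardy-Weinberg frequencies} of the genotypes {\bf AA, Aa, aa}.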
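
As a quick numerical check (the values below are chosen purely for illustration and do not come from any data set), take $p = 0.6$ and $q = 0.4$. Then
\[
F_{n+1}({\bf AA}) = (0.6)^2 = 0.36, \qquad
F_{n+1}({\bf Aa}) = 2(0.6)(0.4) = 0.48, \qquad
F_{n+1}({\bf aa}) = (0.4)^2 = 0.16,
\]
and $0.36 + 0.48 + 0.16 = 1$. Each {\bf AA} cricket carries two copies of $A$ and each {\bf Aa} cricket carries one, so the frequency of $A$ in $G_{n+1}$ is
\[
F_{n+1}({\bf AA}) + \frac{1}{2}F_{n+1}({\bf Aa}) = p^2 + pq = p(p+q) = p,
\]
which is $0.36 + 0.24 = 0.6$ in this example. Under these assumptions the allele frequencies themselves do not change from one generation to the next.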

\end{section}

\begin{section}{The Hardy-Weinberg Principle}

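Let's assume that our crickets are a mixed bag of genotypes, and we pick some locus in the genome where we observe a proportion $p$ of homozygous dominant {\bf AA} crickets, $2q$ of heterozygous {\bf Aa} crickets, and $r$ of homozygous recessive {\bf aa} crickets. Note that in this section $p$, $q$ and $r$ denote genotype proportions, not the allele frequencies of the previous section.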

So, we have the probabilistic sum $F(AA) + F(Aa) + F(aa) = p + 2q + r = (p+q) + (q+r) = 1$, given by the proportion $p:2q:r$ present in $G_0$, the initial population. Here $p+q$ is the frequency of the allele $A$ and $q+r$ is the frequency of the allele $a$.

Under random mating, the next generation has proportions $p_1 = (p+q)^2$, $q_1 = (p+q)(q+r)$ and $r_1 = (q+r)^2$, so that $p_1 + 2q_1 + r_1 = (p + q)^2 + 2(p+q)(q+r) + (q+r)^2 = 1$. This is the distribution of genotypes in our population at $G_1$, the first generation after reproduction.

A consequence of the conditions we assumed about this population is that $G_1$ satisfies the equilibrium condition
$${\bf E} = q_1^2 - p_1r_1 = ((p+q)(q+r))^2 - (p+q)^2(q+r)^2 = 0.$$

Now, compute the distribution for $G_2$.

$$p_2 = (p_1 + q_1)^2 = p_1^2 + 2p_1q_1 + q_1^2 = (p+q)^2\left((p+q)^2 + 2(p+q)(q+r) + (q+r)^2\right).$$

Since $(p+q)^2 + 2(p+q)(q+r) + (q+r)^2 = ((p+q) + (q+r))^2 = 1$, this reduces to $p_2 = (p+q)^2$, and it is clear that $p_2 = p_1$. Similarly, $q_2 = q_1$ and $r_2 = r_1$. Thus, the population is in equilibrium under the assumed conditions. In this case, reproduction maps the genotype distribution of $G_n$ to the same distribution in $G_{n+1}$.
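
To make the argument concrete, here is a small worked example with made-up starting proportions (illustrative values only): let $p = 0.5$, $2q = 0.2$ and $r = 0.3$ in $G_0$, so that $p + q = 0.6$ and $q + r = 0.4$. The initial population is not in equilibrium, since $q^2 - pr = 0.01 - 0.15 = -0.14 \neq 0$. After one round of random mating,
\[
p_1 = (0.6)^2 = 0.36, \qquad q_1 = (0.6)(0.4) = 0.24, \qquad r_1 = (0.4)^2 = 0.16,
\]
and now $q_1^2 - p_1 r_1 = 0.0576 - 0.0576 = 0$. One more generation gives
\[
p_2 = (p_1 + q_1)^2 = (0.6)^2 = 0.36 = p_1, \qquad
q_2 = (p_1 + q_1)(q_1 + r_1) = 0.24 = q_1, \qquad
r_2 = (q_1 + r_1)^2 = 0.16 = r_1,
\]
so in this example equilibrium is reached after a single generation and is maintained afterwards.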


\end{section}

\begin{section}{The Wright-Fisher Model}

The Wright-Fisher model describes the process of genetic drift in a population, and can be derived by slightly altering our previous set of assumptions: the population is now finite. In this case, we have:

\begin{itemize}
\item A population of $N$ crickets (and a pool of $2N$ gametes).
\item All genetic information is preserved under reproduction (i.e. no sexual recombination).
\item No generational overlap.
\item Random mating, no sexual selection.
\item No mutations.
\item No selective pressures.
\end{itemize}

Again, we take the pair of alleles $(A,a)$ present in the population. The frequency distribution of the alleles depends on the size of the population: if there are $i$ copies of $A$, then there are $N-i$ copies of $a$, where $N$ is the size of the population.

We begin looking at our population at $G_0$ with $F(A)=p=i/N$ and $F(a)=1-F(A)=1-p$.

The probability that an allele with $i$ copies in $G_n$ has $j$ copies in $G_{n+1}$ is given by the formula:

$$P_{ij} = \binom{N}{j}p^j(1-p)^{N-j}, \qquad p = i/N.$$

Let the variable $K_t$ represent the count of the allele $A$ in generation $G_t$. Starting with $K_0=i$, the count $K_1$ is binomially distributed with parameters $N$ and $p = i/N$.

Thus, the mean of $K_1$ is $E(K_1)=Np=i$ and the variance is $V(K_1)=Np(1-p)$.
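
One way to see where these moments come from, sketched here under the binomial sampling assumption: write $K_1 = X_1 + \dots + X_N$, where $X_k$ (a symbol introduced only for this illustration) equals $1$ if the $k$-th sampled allele is $A$ and $0$ otherwise. The $X_k$ are independent, with $E(X_k) = p$ and $V(X_k) = p(1-p)$, so
\[
E(K_1) = \sum_{k=1}^{N} E(X_k) = Np = i, \qquad
V(K_1) = \sum_{k=1}^{N} V(X_k) = Np(1-p).
\]
For instance, with the illustrative values $N = 10$ and $i = 3$ we have $p = 0.3$, so $E(K_1) = 3$ and $V(K_1) = 10(0.3)(0.7) = 2.1$. The chance that $A$ is lost in a single generation is $P_{3,0} = (0.7)^{10} \approx 0.028$.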

Each variant of an allele will fluctuate at random over many generations, and will ultimately terminate in one of two absorbing states: $K_t = 0$ (i.e. it goes extinct) or $K_t = N$ (it is fixed throughout the entire population).
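
A short calculation, offered as a sketch rather than a full proof, shows why these two states are absorbing and how likely each one is. If the count of $A$ is $0$ then $p = 0$, and if it is $N$ then $p = 1$, so the binomial formula gives
\[
P_{0j} = \begin{cases} 1 & j = 0 \\ 0 & j \neq 0 \end{cases}
\qquad \mbox{and} \qquad
P_{Nj} = \begin{cases} 1 & j = N \\ 0 & j \neq N \end{cases} .
\]
Writing $K_t$ for the count of $A$ in generation $G_t$, the conditional mean is $E(K_{t+1} \mid K_t) = N \cdot (K_t/N) = K_t$, so $E(K_t) = i$ for every $t$. Let $u$ (a symbol introduced only for this illustration) be the probability that $A$ is eventually fixed. In the long run the count is $N$ with probability $u$ and $0$ with probability $1-u$, hence
\[
i = N \cdot u + 0 \cdot (1 - u) \quad \Longrightarrow \quad u = \frac{i}{N} .
\]
In other words, under pure drift an allele's chance of eventual fixation equals its initial frequency.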

So, if we consider the $N$ alleles that are admitted into the current generation from the previous one, an allele with $i$ copies has frequency $i/N$ in the gene pool, and each successive generation is similarly chosen by randomly picking $N$ alleles, with replacement, from the pool of $2N$ gametes.

Performing $N$ trials, each with $p = i/N$ chance of success, the number of $A$ alleles in the next generation will be binomially distributed with parameters $(N, i/N)$.

\end{section}
